Association of phenotypic variation with common genetic variants across the genome (single nucleotide polymorphisms, SNPs)
| id | outcome (\(y\)) |
|---|---|
| 1 | 1.4 |
| 2 | -3.2 |
| 3 | 1.5 |
| .. | … |
| N | -3.1 |
\(\sim\)
| id | SNP 1 (\(X_1\)) | SNP 2 (\(X_2\)) | … | SNP M (\(X_M\)) |
|---|---|---|---|---|
| 1 | 0 | 2 | … | 1 |
| 2 | 1 | 1 | … | 0 |
| 3 | 1 | 0 | … | 2 |
| .. | … | … | … | … |
| N | 2 | 1 | … | 1 |
Agnostic search & the simplest strategy. For every SNP \(i\):
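A minimal sketch of such a per-SNP scan, assuming the simplest strategy is an ordinary least-squares regression of \(y\) on each SNP in turn. The function name `single_snp_scan` and the NumPy implementation are illustrative and not taken from the source.

```python
import numpy as np

def single_snp_scan(y, X):
    """Regress y on each SNP (column of X) separately.

    Returns per-SNP effect estimates, standard errors and z-scores.
    Assumes every SNP is polymorphic (non-zero variance).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n = y.shape[0]
    yc = y - y.mean()                      # centre the outcome
    Xc = X - X.mean(axis=0)                # centre each SNP
    sxx = (Xc ** 2).sum(axis=0)            # per-SNP sum of squares
    beta = Xc.T @ yc / sxx                 # OLS slope for each SNP
    rss = (yc ** 2).sum() - beta ** 2 * sxx  # residual sum of squares per model
    sigma2 = rss / (n - 2)                 # intercept + slope use 2 df
    se = np.sqrt(sigma2 / sxx)
    return beta, se, beta / se
```

Each column of the genotype table is tested independently, so the scan makes no assumption about which SNPs matter.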
Statistical power depends on the non-centrality parameter (NCP) of the test. For a SNP with minor allele frequency (MAF) \(p\) and effect size \(\beta\), tested in \(N\) unrelated individuals with a linear model (LM):
\[NCP = N q^2 = N \beta^2 [2 p (1 - p)]\]
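The step from NCP to power can be sketched as follows, assuming a 1-df chi-square association test. The significance level \(5 \times 10^{-8}\) is the conventional genome-wide threshold and is an assumption here, as are the function names and the optional `sigma2_r` argument (set to 1 to match the formula above).

```python
from statistics import NormalDist

def ncp_main_effect(n, beta, maf, sigma2_r=1.0):
    """NCP = N * beta^2 * 2p(1-p) / sigma_r^2 (sigma_r^2 = 1 gives the formula above)."""
    return n * beta ** 2 * 2.0 * maf * (1.0 - maf) / sigma2_r

def power_from_ncp(ncp, alpha=5e-8):
    """Power of a 1-df chi-square test with non-centrality ncp at level alpha."""
    std = NormalDist()
    z_crit = std.inv_cdf(1.0 - alpha / 2.0)   # two-sided critical value
    shift = ncp ** 0.5
    # A 1-df non-central chi-square variable is (Z + sqrt(ncp))^2, Z ~ N(0, 1),
    # so the tail probability reduces to two normal tails.
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)
```

For example, `power_from_ncp(ncp_main_effect(10000, 0.1, 0.5))` gives the approximate power for \(N = 10000\), \(\beta = 0.1\) and \(p = 0.5\) with a unit-variance outcome.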
Figure source: https://www.ebi.ac.uk/gwas/diagram
\(var(y) = \sum{\sigma_i^2 R_i} + \sigma_r^2 I\) ⟶ LMM
| Unrelated Individuals (LM) | Related Individuals (LMM) |
|---|---|
| \(y \sim (\mu + \beta x , \mbox{ } \sigma^2_r I)\) | \(y \sim (\mu + \beta x , \mbox{ } \sum{\sigma_i^2 R_i} + \sigma_r^2 I)\) |
| \(NCP = N q^2 = N \beta^2 [2 p (1 - p)]\) | \(NCP =\) ? |
| Unrelated Individuals (LM) | Related Individuals (LMM) |
|---|---|
| \(y \sim (\mu + \beta x + d \alpha + \delta \mbox{ } x*d , \mbox{ } \sigma^2_r I)\) | \(y \sim (\mu + \beta x + d \alpha + \delta \mbox{ } x*d , \mbox{ } \sum{\sigma_i^2 R_i} + \sigma_r^2 I)\) |
| \(NCP = N q^2 w^2= N \delta^2 [2 p (1 - p) f (1 - f)]\) | \(NCP =\) ? |
Related individuals
(Genetically) Unrelated individuals
| Generative model | Association model |
|---|---|
| \(y \sim (\mu + \sum_{i=1}^{M} \beta_i X_i, \sigma^2_r I)\) \(\beta_i \sim \mathcal{N}(0, \sigma^2_{\beta})\) | \(y \sim (\mu + \beta_k X_k, \sigma_m^2 M_{-k} + \sigma^2_r I)\) \(M_{-k} = X X^T / (M-1)\) |
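
The generative model can be simulated with a short sketch, assuming standardised genotypes, \(\mu = 0\), and \(\sigma^2_{\beta} = h^2 / M\) so that the trait has unit variance. The heritability argument `h2`, the MAF range, and the reading of \(M_{-k}\) as the relationship matrix built from all SNPs except SNP \(k\) are assumptions, not statements from the source.

```python
import numpy as np

def simulate_polygenic(n=200, m=500, h2=0.5, seed=0):
    """Simulate standardised genotypes X and a polygenic trait y."""
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.05, 0.5, size=m)             # assumed MAF range
    X = rng.binomial(2, maf, size=(n, m)).astype(float)
    X = X[:, X.std(axis=0) > 0]                      # drop monomorphic SNPs
    X = (X - X.mean(axis=0)) / X.std(axis=0)         # centre and scale each SNP
    m_kept = X.shape[1]
    beta = rng.normal(0.0, np.sqrt(h2 / m_kept), size=m_kept)  # beta_i ~ N(0, sigma_beta^2)
    y = X @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)  # residual variance 1 - h2
    return X, y, beta

def relationship_without_snp(X, k):
    """Relationship matrix from all SNPs except SNP k, scaled by their count (M - 1)."""
    X_rest = np.delete(X, k, axis=1)
    return X_rest @ X_rest.T / X_rest.shape[1]
```

In the association model, SNP \(k\) is then tested as a fixed effect while the remaining SNPs enter through this matrix as a random effect.
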
Which study design is more powerful?
(for a given SNP with MAF (\(p\)), effect size (\(\beta\)) and sample size \(N\))
\(y \sim (\mu + \beta x, \sum{\sigma_i^2 R_i} + \sigma_r^2 I) = (\mu + \beta x, V)\)
\(\hat{V} = \sum{\hat{\sigma}_i^2 R_i} + \hat{\sigma}_r^2 I\)
\(\hat{\beta} = (x^T \hat{V}^{-1} x)^{-1} x^T \hat{V}^{-1} y\)
\(var(\hat{\beta}) = (x^T \hat{V}^{-1} x)^{-1}\)
\(y\) and \(x\): centered
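The estimator and its variance can be computed directly once \(\hat{V}\) is available. This is a minimal sketch, assuming \(\hat{V}\) has already been estimated (for example under the null model) and is symmetric positive definite; the function name `gls_snp_test` is illustrative.

```python
import numpy as np

def gls_snp_test(y, x, V):
    """Generalised least squares test of one SNP given a covariance matrix V.

    Returns (beta_hat, var_beta_hat, chi2) with chi2 = beta_hat^2 / var_beta_hat.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    y = y - y.mean()                    # y and x are centred, as on the slide
    x = x - x.mean()
    Vinv_x = np.linalg.solve(V, x)      # V^{-1} x without forming the inverse
    xVx = x @ Vinv_x                    # x^T V^{-1} x
    beta = (Vinv_x @ y) / xVx           # V symmetric, so x^T V^{-1} y = (V^{-1} x)^T y
    var_beta = 1.0 / xVx
    return beta, var_beta, beta ** 2 / var_beta
```

With `V = np.eye(N)` the function reduces to ordinary least squares, which is the unrelated-individuals case.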
\(NCP = \hat{\beta}^2 / var(\hat{\beta}); var(\hat{\beta}) = [x^T \hat{V}^{-1} x]^{-1} \approx [E(x^T \hat{V}^{-1} x)]^{-1} \approx [trace(\hat{V}^{-1} \Sigma_x)]^{-1}\)
Approximation using the quadratic forms
If \(x\) is a vector of random variables, the quadratic form \(x^TAx\) is a scalar random variable.
If \(x\) has mean \(\mu\) and (nonsingular) covariance matrix \(V\), then
\(E(x^TAx) = tr(AV) + \mu^T A \mu\)
\(\sigma^2(x^TAx) = 2tr(AVAV) + 4\mu^T AVA \mu\)
(Lynch and Walsh, 1998)
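The two moment formulas can be checked numerically. The sketch below assumes \(x\) is multivariate normal and \(A\) is symmetric, which is the setting in which the variance formula holds; the function names and the Monte Carlo check are illustrative additions.

```python
import numpy as np

def quadratic_form_moments(A, V, mu):
    """Theoretical mean and variance of x^T A x for x ~ N(mu, V), A symmetric."""
    mean = np.trace(A @ V) + mu @ A @ mu
    var = 2.0 * np.trace(A @ V @ A @ V) + 4.0 * (mu @ A @ V @ A @ mu)
    return mean, var

def simulate_quadratic_form(A, V, mu, n_draws=200_000, seed=0):
    """Monte Carlo estimate of the same two moments."""
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(mu, V, size=n_draws)
    q = np.einsum("ij,jk,ik->i", x, A, x)   # x_i^T A x_i for every draw
    return q.mean(), q.var()
```

Only the expectation is needed for the NCP approximation; the variance indicates how tightly \(x^T \hat{V}^{-1} x\) concentrates around its mean.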
For unrelated individuals & testing main genetic effect:
\(\hat{V} = \hat{\sigma}_r^2 I\) and \(\Sigma_x = \sigma_x^2 I = 2 p (1 - p) I\)
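As a worked check, substituting these two matrices into the trace approximation recovers the LM expression for the NCP:
\[trace(\hat{V}^{-1} \Sigma_x) = trace\left(\frac{2 p (1 - p)}{\hat{\sigma}_r^2} I\right) = \frac{N}{\hat{\sigma}_r^2} [2 p (1 - p)]\]
\[NCP \approx \beta^2 \, trace(\hat{V}^{-1} \Sigma_x) = \frac{N}{\hat{\sigma}_r^2} \beta^2 [2 p (1 - p)]\]
With \(\hat{\sigma}_r^2 = 1\) this is \(NCP = N \beta^2 [2 p (1 - p)]\).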
\(x\): centered
\(trace\): trace operator
\(\Sigma_x\): covariance matrix of the genotype \(x\) across individuals (proportional to the genetic relationship matrix)
| Study | Model \(y \sim (X \beta , V)\) | Genotype \(x_g\) | \(NCP\) | TF |
|---|---|---|---|---|
| Unrelated | \(y \sim (\mu + \beta_g x_g, \sigma_r^2 I)\) | \(x_g \sim (\mu_g, \sigma_g^2 I)\) | \([N / \sigma_r^2] \beta_g^2 \sigma_g^2\) | 1 |
| Families | \(y \sim (\mu + \beta_g x_g, \sigma_k^2 K + \sigma_r^2 I)\) | \(x_g \sim (\mu_g, \sigma_g^2 K)\) | \([tr((\sigma_k^2 K + \sigma_r^2 I)^{-1} K)] \beta_g^2 \sigma_g^2\) | 0.8253 |
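
The trace term for a family design can be evaluated numerically. This is a sketch under stated assumptions: TF is read as the trace term divided by its unrelated-individuals value \(N / \sigma^2\) at the same total variance (the source does not define TF), and the nuclear-family structure (two unrelated parents plus siblings) is a hypothetical example that is not meant to reproduce the value in the table.

```python
import numpy as np

def nuclear_family_relationship(n_families, n_sibs=2):
    """Block-diagonal relationship matrix K: parents unrelated, all other pairs 0.5."""
    size = 2 + n_sibs
    block = np.full((size, size), 0.5)
    block[0, 1] = block[1, 0] = 0.0       # the two parents are unrelated
    np.fill_diagonal(block, 1.0)
    return np.kron(np.eye(n_families), block)

def trace_factor(V, Sigma, total_var=1.0):
    """trace(V^{-1} Sigma) relative to the unrelated-design value N / total_var."""
    n = V.shape[0]
    return np.trace(np.linalg.solve(V, Sigma)) * total_var / n

def family_trace_factor(K, sigma2_k, sigma2_r):
    """Trace factor for V = sigma_k^2 K + sigma_r^2 I and genotype covariance K."""
    V = sigma2_k * K + sigma2_r * np.eye(K.shape[0])
    return trace_factor(V, K, total_var=sigma2_k + sigma2_r)
```

When \(K\) has unit diagonal, \(\lambda / (\sigma_k^2 \lambda + \sigma_r^2)\) is concave in the eigenvalues \(\lambda\) of \(K\), so this ratio cannot exceed 1, consistent with the direction of the tabulated value.
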
| Study | Model \(y \sim (X \beta , V)\) | Genotype \(x_g\) | \(NCP\) | TF |
|---|---|---|---|---|
| Unrelated | \(y \sim (\mu + \beta_g x_g, \sigma_r^2 I)\) | \(x_g \sim (\mu_g, \sigma_g^2 I)\) | \([N / \sigma_r^2] \beta_g^2 \sigma_g^2\) | 1 |
| Unrelated + Grouping | \(y \sim (\mu + \beta_g x_g, \sigma_h^2 H + \sigma_r^2 I)\) | \(x_g \sim (\mu_g, \sigma_g^2 I)\) | \([tr((\sigma_h^2 H + \sigma_r^2 I)^{-1})] \beta_g^2 \sigma_g^2\) | >1 |
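
The grouping case can be sketched in the same way. The assumptions are that \(H\) is a group-membership indicator matrix (1 for pairs in the same group, 0 otherwise) and that TF is the trace term divided by \(N / (\sigma_h^2 + \sigma_r^2)\); neither definition is given in the source, and the function names are illustrative.

```python
import numpy as np

def grouping_matrix(group_sizes):
    """Indicator matrix H: H[i, j] = 1 if individuals i and j share a group."""
    labels = np.repeat(np.arange(len(group_sizes)), group_sizes)
    return (labels[:, None] == labels[None, :]).astype(float)

def grouping_trace_factor(H, sigma2_h, sigma2_r):
    """trace((sigma_h^2 H + sigma_r^2 I)^{-1}) relative to N / (sigma_h^2 + sigma_r^2)."""
    n = H.shape[0]
    V = sigma2_h * H + sigma2_r * np.eye(n)
    return np.trace(np.linalg.inv(V)) * (sigma2_h + sigma2_r) / n
```

Under these assumptions the ratio is at least 1, because \(1 / (\sigma_h^2 \lambda + \sigma_r^2)\) is convex in the eigenvalues \(\lambda\) of \(H\), whose mean is 1. Modelling the grouping effect removes shared variance from the residual while the genotypes stay independent.
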
| Study | Model \(y \sim (X \beta , V)\) | Genotype \(x_g\) | \(NCP\) | TF |
|---|---|---|---|---|
| Unrelated | \(y \sim (\mu + ... + \delta x_{int}, \sigma_r^2 I)\) | \(x_{int} \sim (\mu_{int}, \sigma_{int}^2 I)\) | \([N / \sigma_r^2] \delta^2 \sigma_g^2 \sigma_d^2\) | 1 |
| Families | \(y \sim (\mu + ... + \delta x_{int}, \sigma_k^2 K + \sigma_r^2 I)\) | \(x_{int} \sim (\mu_{int}, \sigma_{int}^2 K_{int})\) | \([tr((\sigma_k^2 K + \sigma_r^2 I)^{-1} K_{int})] \delta^2 \sigma_g^2 \sigma_d^2\) | 1.003-8.043 |
Testing main genetic effect (GWAS)
Testing GxE interaction effect (GWAI)
Applications